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■^^ . Abstract 

|^^_ ■ Dynamic task allocation is an essential requirement for multi-robot 

fvj ' systems operating in unknown dynamic environments. It allows robots to 

, change their behavior in response to environmental changes or actions of 

I !■ other robots in order to improve overall system performance. Emergent 

C_J ' coordination algorithms for task allocation that use only local sensing and 

^^ , no direct communication between robots are attractive because they are 

• ' robust and scalable. However, a lack of formal analysis tools makes emer- 

»j ' gent coordination algorithms difficult to design. In this paper we present 

I I. a mathematical model of a general dynamic task allocation mechanism. 

' Robots using this mechanism have to choose between two types of task, 

" I and the goal is to achieve a desired task division in the absence of ex- 

'^ ■ plicit communication and global knowledge. Robots estimate the state of 

. ' the environment from repeated local observations and decide which task 

I , to choose based on these observations. We model the robots and obser- 

^^ ■ vations as stochastic processes and study the dynamics of the collective 

^S ' behavior. Specifically, we analyze the effect that the number of obser- 

\^ , vations and the choice of the decision function have on the performance 

f^ ' of the system. The mathematical models are validated in a multi-robot 

^^ I multi-foraging scenario. The model's predictions agree very closely with 

^ . experimental results from sensor-based simulations. 

_>: 

^ '. 1 Introduction 

. y^ , In the 1980's it was considered ground-breaking for a mobile robot to move 

around an unstructured environment at reasonable speeds. In the years since, 
advancements in both hardware mechanisms and software architectures and 
algorithms have resulted in quite capable mobile robot systems. Provided with 
this baseline competency of individual robots, increasing attention has been paid 
to the study of Multi-Robot Systems (MRS), and in particular distributed MRS 
with which the remainder of this paper is concerned. In a distributed MRS there 
is no centralized control mechanism - instead, each robot operates independently 
under local sensing and control, with coordinated system-level behavior arising 
from local interactions among the robots and between the robots and the task 
environment. The effective design of coordinated MRS is restricted by the lack 
of formal design tools and methodologies. The design of single robot systems 



(SRS) has greatly benefited from the formahsms provided by control theory - 
the design of MRS is in need of analogous formalisms. 

For a group of robots to effectively perform a given system-level task, the 
designer must address the question of which robot should do which task and 
when g]. The process of assigning individual robots to sub-tasks of a given 
system-level task is called task allocation, and it is a key functionality required 
of any MRS. Dynamic task allocation is a class of task allocation in which 
the assignment of robots to sub-tasks is a dynamic process and may need to 
be continuously adjusted in response to changes in the task environment or 
group performance. The problem of task allocation in a distributed MRS is 
further compounded by the fact that task allocation must occur as a result 
of a distributed process as there is no central coordinator available to make 
task assignments. This increases the problem's complexity because, due to the 
local sensing of each robot, no robot has a complete view of the world state. 
Given this incomplete and often noisy information, each robot must make local 
control decisions about which actions to perform and when, without complete 
knowledge of what other robots have done in the past, are doing now, or will 
do in the future. 

There are a number of task allocation models and philosophies. Historically, 
the most popular approaches rely on intentional coordination to achieve task 
allocation J2J| • In those, the robots coordinate their respective actions explicitly 
through deliberate communications and negotiations. Due to scaling issues, such 
approaches are primarily used in MRS consisting of a relatively small number 
of robots (i.e., fewer than 10). Task allocation through intentional coordination 
remains the preferred approach because it is better understood, easier to design 
and implement, and more amenable to formal analysis 0]. 

As the size of the MRS grows, the complexity of the design of intentional ap- 
proaches increases due to increased demands in communication bandwidth and 
computational abilities of individual robots. Furthermore, complexity intro- 
duced by increased robot interactions makes such systems much more difficult 
to analyze and design. This leads to the alternative to intentional coordination, 
namely, task allocation through utilizing emergent coordination. In systems 
using emergent coordination, individual robots coordinate their actions based 
solely on local sensing information and local interactions. Typically, there is 
very little or no direct communication or explicit negotiations between robots. 
They are, therefore, more scalable to larger numbers of robots and are more 
able to take advantage of the robustness and parallelism provided by the aggre- 
gation of large numbers of coordinated robots. The drawback of task allocation 
as achieved through emergent coordination mechanisms is that such systems 
can be difficult to design, solutions are commonly sub-optimal, and since co- 
ordination is achieved through many simultaneous local interactions between 
various subsets of robots, predictive analysis of expected system performance is 
difficult. 

As MRS composed of ever-larger numbers of robots become available, the 
need for task allocation through emergent coordination will increase. To address 
the lack of formalisms in the design of such MRS, in this article we present 



and experimentally verify a predictive mathematical model of dynamic task 
allocation for MRS using emergent coordination. Such a formal model of task 
allocation is a positive step in the direction of placing the design of MRS on a 
formal footing. 

The rest of the paper is organized as follows. Section |21 provides a summary 
of related work. In Section we describe a general mechanism for task alloca- 
tion in dynamic environments. This is a distributed mechanism based on local 
sensing. In Section^ we present a mathematical model of the collective behav- 
ior of an MRS using this mechanism and study its performance under a variety 
of conditions. We validate the model in a multi- foraging domain. In Section |21 
we define the experimental task domain of multi-foraging, robot controllers and 
the simulation environment. In Section |H1 we compare the predictions of math- 
ematical models with the results from sensor-based simulations. We conclude 
the paper in Section 13 with a discussion of the approach and the results. 

2 Related Work 

Mathematical modeling and analysis of the collective behavior of MRS is a 
relatively new field with approaches and methodologies borrowed from other 
fields, including mathematics, physics, and biology. Recently, a number of re- 
searchers attempted to mathematically analyze multi-robot systems by using 
phenomenological models of the type present here. Sugawara et al. [23 1^ 
developed a simple model of cooperative foraging in groups of communicating 
and non-communicating robots. Kazadi et al. |11| studied the general prop- 
erties of multi-robot aggregation using phenomenological macroscopic models. 
Agassounon and Martinoli P presented a model of aggregation in which the 
number of robots taking part in the clustering task is based on the division of 
labor mechanism in ants. These models are ad-hoc and domain specific, and the 
authors give no explanation as to how to apply such models to other domain. In 
earlier works we have developed a general framework for creating phenomeno- 
logical models of collective behavior in groups of robots jEl^l- We applied this 
framework to study collaborative stick-pulling in a group of reactive robots |17| 
and foraging in robots ^21 ■ 

Most of the approaches listed above are implicitly or explicitly based on 
stochastic processes theory. Another example of the stochastic approach is the 
probabilistic microscopic model developed by Martinoli and coworkers |19[l2Ul l5| 
to study collective behavior of a group of robots. Rather than compute the 
exact trajectories and sensory information of individual robots, Martinoli et al. 
model each robot's interactions with other robots and the environment as a 
series of stochastic events, with probabilities determined by simple geometric 
considerations. Running several series of stochastic events in parallel, one for 
each robot, allowed them to study the group behavior of the multi-robot system. 

So far very little work has been done on mathematical analysis of multi-robot 
systems in dynamic environments. We have recently extended |14| the stochas- 
tic processes framework developed in earlier work to robots that change their 



behavior based on history of local observations of the (possibly changing) envi- 
ronment [2]- In the current paper we develop these ideas further, and present 
the exact stochastic model of the system, in addition to the phenomenological 
model. 

Closest to ours is the work of Huberman and Hogg |7j , who mathematically 
studied collective behavior of a system of adaptive agents using game dynamics 
as a mechanism for adaptation. In game dynamical systems, winning strategies 
are rewarded, and agents use the best performing strategies to decide their next 
move. Although their adaptation mechanism is different from our dynamic 
task allocation mechanism, their analytic approach is similar to ours, in that 
it is based on the theory of stochastic processes. Others have mathematically 
studied collective behavior of systems composed of large numbers of concurrent 
learners |25[I22| . These are microscopic models, which only allow one to study 
collective behavior of relatively small systems of a few robots. We are interested 
in macroscopic approaches that enable us to directly study collective behavior 
in large systems. Our work differs from earlier ones in another important way: 
we systematically compare theoretical predictions of mathematical models with 
results of experiments carried out in a sensor-based simulator. 

3 Dynamic Task Allocation Mechanism 

The dynamic task allocation scenario we study considers a world populated with 
tasks of T different types and robots that are equally capable of performing each 
task but can only be assigned to one type at any given time. For example, the 
tasks could be targets of different priority that have to be tracked, different 
types of explosives that need to be located, etc. Additionally, a robot cannot be 
idle — each robot is always performing a task at any given time. We introduce 
the notion of a robot state as a shorthand for the type of task the robot is 
assigned to service. A robot may switch its state according to its control policy 
when it determines it is appropriate to do so. However, needlessly switching 
tasks is to be avoided, since in physical robot systems, this can involve complex 
physical movement that requires time to perform. 

The purpose of task allocation is to assign robots to tasks in a way that 
will enhance the performance of the system, which typically means reducing 
the overall execution time. Thus, if all tasks take an equal amount of time to 
complete, in the best allocation, the fraction of robots in state i will be equal 
to the fraction of tasks of type i. In general, however, the desired allocation 
could take other forms — for example, it could be related to the relative reward 
or cost of completing each task type — without change to our approach. In 
the dynamic task allocation scenario, the number of tasks and the number of 
available robots are allowed to change over time, for example, by adding new 
tasks, deploying new robots, or removing malfunctioning robots. 

The challenge faced by the designer is to devise a mechanism that will lead to 
a desired task allocation in a distributed MRS even as the environment changes. 
The challenge is made even more difficult by the fact that robots have limited 



sensing capabilities, do not directly communicate with other robots, and there- 
fore, cannot acquire global information about the state of the world, the initial 
or current number of tasks (total or by type), or the initial or current number 
of robots (total or by assigned type). Instead, robots can sample the world 
(assumed to be finite) — for example, by moving around and making local ob- 
servations of the environment. We assume that robots are able to observe tasks 
and discriminate their types. They may also be able to observe and discriminate 
the task states of other robots. 

One way to give the robot an ability to respond to environmental changes 
(including actions of other robots) is to give a robot an internal state where it can 
store its knowledge of the environment as captured by its observations p[Tl|. 
The observations are stored in a rolling history window of finite length, with new 
observations replacing the oldest ones. The robot consults these observations 
periodically and updates its task state according to some transition function 
specified by the designer. In an earlier work we showed [HI El that this simple 
dynamic task allocation mechanism leads to the desired task allocation in a 
multi- for aging scenario. 

In the following sections we present a mathematical model of dynamic task 
allocation and study the role that transition function and the number of obser- 
vations (history length) play in the performance of a multi- foraging MRS. In 
Section ini we present a model of a simple scenario in which robots base their 
decisions to change state solely on observations of tasks in the environment. We 
study the simplest form of the transition function, in which the probability to 
change state to some type is proportional to the fraction of existing tasks of 
that type. In Section ffi.ll we compare theoretical predictions with no adjustable 
parameters to experimental data and find excellent agreement. In Section 14.21 
we examine the more complex scenario where the robots base their decisions 
to change task state on the observations of both existing task types and task 
states of other robots. In Section 16.21 we study the consequences of the choice 
of the transition function and history length on the system behavior and find 
good agreement with the experimental data. 

4 Analysis of Dynamic Task Allocation 

As proposed in the previous section, a robot may be able to adapt to a changing 
environment in the absence of complete global knowledge if it is able to make 
and remember local observations of the environment. In the treatment below 
we assume that there are two types of tasks — arbitrarily referred to as Red 
and Green. This simplification is for pedagogical reason only; the model can 
be extended to a greater number of task types. 

During a sufficiently short time interval, each robot can be considered to 
belong to the Green or Red task state. This is a very high level, coarse-grained 
description. In reality, each state is composed of several robot actions and 
behaviors, for example, searching for new tasks, detecting and executing them, 
avoiding obstacles, etc. However, since we want the model to capture how the 



fraction of robots in each task state evolves in time, it is a sufficient level of 
abstraction to consider only these two states. If we find that additional levels 
of detail are required to explain system behavior, we can elaborate the model 
by breaking each of the high level states into its underlying components. 

4.1 Observations of Tasks Only 

In this section we study dynamic task allocation mechanism in which robots 
make decisions to switch task states based solely on observations of available 
tasks. Let rrir and rUg be the numbers of the observed Red and Green tasks, 
respectively, in a robot's memory or history window. The robot chooses to 
change its state, or the type of task it is assigned to execute, with probabilities 
given by transition functions /g^rC'^irj^^g) (probability of switching to Red 
from Green) and fr^g{mr, nig) (probability of switching to Green from Red). 
We would like to define transition rules so that the fraction of time the robot 
spends in the Red (Green) state be equal to the fraction of Red (Green) tasks. 
This will assure that on average the number of Red and Green robots reflect 
the desired task distribution. Clearly, if the robots have global knowledge about 
the numbers of Red and Green tasks Mr and Mg , then each robot could choose 
each state with probability equal to the fraction of the tasks of corresponding 
type. Such global knowledge is not available; hence, we want to investigate 
how incomplete knowledge of the environment (through local observations), as 
well as the dynamically changing environment (e.g., changing ratio of Red and 
Green tasks), affects task allocation. 

4.1.1 Modelling Robot Observations 

As explained above, the transition rate between task execution states depends 
on robot's observations stored in its history. In our model we assume that a 
robot makes an observation of a task with a time period r. For simplicity, by 
an observation we mean here detecting a task, such as a target to be monitored, 
mine to be cleared or an object to be gathered. Therefore, observation history of 
length h comprises of the number of Red and Green tasks a robot has observed 
during a time interval hr. We assume that r has unit length and drop it. 
The process of observing a task is given by a Poisson distribution with rate 
A — aM", where a is a constant characterizing the physical parameters of the 
robot such as its speed, view angles, etc., and M° is the number of tasks in the 
environment. This simplification is based on the idea that robot's interactions 
with other robots and the environment are independent of the robot's actual 
trajectory and are governed by probabilities determined by simple geometric 
considerations. This simplification has been shown to produce remarkably good 
agreements with experiments J2(JI |S] . 

Let Mr{t) and Mg{t) be the number of Red and Green tasks respectively 
(can be time dependent), and let M(t) — Mr(t) + Mg(t) be the total number of 
tasks. The probability that in the time interval [t — h, t] the robot has observed 



exactly rrir and rrig tasks is the product of two Poisson distributions 

mrlmg 



nmr,mg) = ^—S-e-'^-'^ (1) 



where Xi , i = r, g, are the means of the respective distributions. If the task 
distribution does not change in time, A^ = aMih. For time dependent task 
distributions, A^ = a ^^_^dt' Mi[t'). 

4.1.2 Individual Dynamics: The Stochastic Master Equation 

Let us consider a single robot that has to decide between executing Red and 
Green tasks in a closed arena and makes a transition to Red and Green states 
according to its observations. Let Pr{t) be the probability that a robot is in the 
Red state at time t. The equation governing its evolution is 

-^ ^e{l~ Pr)fg^r ' SPrfr^g (2) 

where e is the rate at which the robot makes decisions to switch its state, 
and fg—>r and fr^g are the corresponding transitions probabilities between the 
states. As explained above, these probabilities depend on the robot's history — 
the number of tasks of either type it has observed during the time interval h pre- 
ceding the transition. If the robots have global knowledge about the numbers 
of Red and Green tasks M^ and Alg, one could choose the transition proba- 
bilities as the fraction of tasks of corresponding type, fg^r ^ Mr/{Mr + Mg) 
and fr^g oc Mg/{Mr + Mg). In the case when the global information is not 
available, it is natural to use similar transition probabilities using robots' local 
estimates: 

fg^r{mr,mg) = -^ EE jr{mr,mg) (3) 

rrir + nig 

Tfl 

f.r^g{nir,mg) = = '-fg{mr,nig) (4) 

mr + nig 

Note that 7r(r7ir, rUg) + jg{mr, mg) = 1 whenever rur + nig > 0, e.g., whenever 
there is at least one observation in the history window. In the case when there 
are no observations in history, nir = nig ~ 0, robots will choose either state with 
probability 1/2 as it follows from taking the appropriate limits in Equations O 
and^ Hence, we supplement Equation|31with /g_r(0,0) = /r^g(0, 0) = (and 
similarly for Equation 0)) to assure that robots do not change their state when 
the history window does not contain any observations. 

Equation|21 together with the transition rates shown in Equations|3}^ deter- 
mines the evolution of the probability density of a robot's state. It is a stochastic 
equation since the coefficients (transition rates) depend on random variables nrir 
and mg. Moreover, since the robot's history changes gradually, the values of the 
coefficients at different times are correlated, hence making the exact treatment 
very difficult. We propose, instead, to study the problem within the annealed 



approximation: we neglect time-correlation between robot's histories at differ- 
ent times, assuming instead that at any time the real history {mr,mg} can 
be replaced by a random one drawn from the Poisson distribution Equation ^ 
Next, we average Equation [3 over all histories to obtain 

-J^ = SlrC^ - nr) ~ e^gUr (5) 

Here 7^ and 7^ are given by 

^r = EPi^^9)^,% - E^(-'.9)7T^ (6) 

r,g ^ r,g 

where P{mr,mg) is the Poisson distribution Equation^ and the summation 
excludes the term r ~ g = {). Note that if the distribution of tasks changes in 
time, then 7^ ^ are time-dependent, 7 = 7r_g(t). 

To proceed further, we need to evaluate the summations in Equationl^l Let 
us define an auxiliary function 

F{x)^ y y a:"''-+"«^^-^e-^-e-^^ "''' (7) 

nir—Om,g—0 

It is easy to check that 7^ are given by 

% = l-F(l)-ie"''^^° (8) 

Differentiating Equation [7| with respect to x yields 

.p 00 00 \mr\"^s 

dx ^L^ ^-^ mJmJ '' ^ ^ 



m^ — 1 mcj— 



„g. 



Note that the summation over rrir starts from rrir = 1. Clearly, the sums over rrir 
and TTig are de-coupled thanks to the cancellation of the denominator {rrir+mg): 

? - f^-^'- E ^--'^rn.) fe-. V i^) (10) 



The resulting sums are evaluated easily (as the Taylor expansion of correspond- 
ing exponential functions), and the results is 

^=A,e-^«(i--) (11) 

where Aq = A,, + Xg. After elementary integration of Equation 1111 (subject to 
the condition F(0) = 1/2), we obtain using Equation |51 and the expressions for 
Xr, Aq: 

lr,g{t) = 7 / dt'^,rAt') (12) 



Here fJ.r,git) — Mr^g{t)/Mo are the fraction of Red and Green tasks respectively. 
Let us first consider the case when the task distribution does not change 
with time, i.e., iJ.r(t) = ^q. Then we have 

7.,,(i)-(l-e-"''*^«V°, (13) 

The solution of Equation subject to the initial condition pr-{t — Q) — pq is 
readily obtained: 



Pr{t) = ^^ + po - ^^ e-(7,.+7,;t (^4) 

V Ir+lgJ 

One can see that the probability distribution approaches the desired steady 
state value p^ — jjl^ exponentially. Also, the coefficient of the exponent depends 
on the density of tasks and the length of the history window. Indeed, it is 
easy to check that % + 7„ = 1 — e~"''*^° . Hence, for large enough Afo and 
/i, ahMo ;2> 1, the convergence rate is determined solely by e. For a small 
task density or short history length, on the other hand, the convergence rate is 
proportional to the number of tasks, £(1 — e~°'^^^°) ~ eahMo. Note that this is 
a direct consequence of the rule that robots do not change their state whenever 
there are no observation in the history window. 

Now let us consider the case where the task distribution changes suddenly 
at time tg, ^r{t) = /^J! + A^0{t — tg), where 0{t — to) is the step function. 
For simplicity, let us assume that ahMo ^ 1 so that the exponential term in 
Equation^lcan be neglected, 

1 /■* 

lr,g{t) = T dt'^^rAt'),%it)+-fg^l (15) 

" Jt-h 

Replacing Eauation llSl into Equation|Sl and solving the resulting differential 
equation yields 

Mt) - A*^ + ^t-^(l-e-*), t<h 

Pr{t) = Ai° + AAi-^(e-^(*-'')-e-^*), t>h. (16) 

Equation 1161 describes how the robot distribution converges to the new steady 
state value after the change in task distribution. Clearly, the convergence prop- 
erties of the solutions depend on h and e. It is easy to see that in the limiting 
case e/i ^ 1 the new steady state is attained after time h, \pr{h) — (/io + A/i)| ~ 
All/ (eh) <C 1, so the convergence time is tconv ~ h. In the other limiting case 
eh <^ 1, on the other hand, the situation is different. A simple analysis of 
Equation 1161 for t > h yields \pr{t) — (/io + ^m)I ^ A/ie~^* so the convergence 
is exponential with characteristic time tconv ~ 1/^- 

4.1.3 Collective Behavior 

In order to make predictions about the behavior of an MRS using a dynamic 
task allocation mechanism, we need to develop a mathematical model of the 



collective behavior of the system. In the previous section we derived a model of 
how an individual robot's behavior changes in time. In this section we extend 
it to model the behavior of a MRS. In particular, we study the collective behav- 
ior of a homogenous system consisting of N robots with identical controllers. 
Mathematically, the MRS is described by a probability density function that 
includes the states of all N robots. However, in most cases we are interested 
in studying the evolution of global, or average, quantities, such as the average 
number of robots in the Red state, rather than the exact probability density 
function. This applies when comparing theoretical predictions with results of 
experiments, which are usually quoted as an average over many experiments. 
Since the robots in either state are independent of each other, Prit), is now 
the fraction of robots in the Red state, and consequently Npr-{t) is the average 
number of robots in that state. The results of the previous section, namely 
solutions for pr (t) for constant task distribution (Equation I14|l and for chang- 
ing task distribution (Equation EJ), can be used to study the average collective 
behavior. Section IHTI presents results of analysis of the mathematical model. 

4.1.4 Stochastic Effects 

In some cases it is useful to know the probability distribution of robot task 
states over the entire MRS. This probability function describes the exact col- 
lective behavior from which one could derive the average behavior as well as 
the fluctuations around the average. Knowing the strength of fluctuations is 
necessary for assessing how the probabilistic nature of robot's observations and 
actions affects the global properties of the system. Below we consider the prob- 
lem of finding the probability distribution of the collective state of the system. 
Let P„ (t) be the probability that there are exactly n robots in the Red state 
at time t. For a sufficiently short time interval At we can write |15| 

Pnit + At)=J2Wn'nit]At)P„,{t)-Y,Wnn'{t;At)Pnit) (17) 

n' n' 

where Wij (t; At) is the transition probability between the states i and j during 
the time interval (t, t + At). In our MRS, this transitions correspond to robots 
changing their state from Red to Green or vice versa. Since the probability 
that more than one robot will have a transition during a time interval At is 
0(At), then, in the limit At — > only transitions between neighboring states 
are allowed in Eauation llTI n —^ n ±1. Hence, we obtain 

dP 

-^ = r„+iP„+i(t) + gn-iPn-i{t) - (r„ + gn)Pn{t) ■ (18) 

dt 

Here r^ is the probability density of having one of the k Red robots change its 
state to Green, and gk is the probability density of having one of the N — k 
Green robots change its state to Red. Let us assume again that ahMo ::^ 1 so 
that 7g = 1 — 7r- Then one has 

rfc = fc(l-%) , 5fe = (iV-fc)7, (19) 



with ro =5-1 =0, tat+i = gN = 0- Ir i^ history-averaged transition rate to 
ReA states. 

The steady state solution of Equation ^1 is given by JOI 

r„r„_i...r2ri 
where Pq is determined by the normahzation: 

g„_i5„-2---5i5o 



ps 



jv --1 



1 + E 



rnrn-i...r2ri 
?i— 1 



(21) 



Using the expression for 7, after some algebra we obtain 



P" = — 7"d -7 )^"" f22) 

e.g., the steady state is a binomial distribution with parameter 7. Note again 
that this is a direct consequence of the independence of the robots' dynamics. 
Indeed, since the robots act independently, in the steady state each robot has 
the same probability of being in either state. Moreover, using this argument it 
becomes clear that the time-dependent probability distribution Pn{t) is given 
by Equation 1221 with 7 replaced by Pr{t), Eauationll4l 

4.2 Observations of Tasks and Robots 

In this section we study the more complex dynamic task allocation mechanism 
in which robots make decisions to change their state based on the observations 
of not only available tasks but also on the observed task states of other robots. 
Specifically, each robot now records the numbers and types of task as well as 
the numbers and task types of robots it has encountered. Again, we let rrir 
and Trig be the number of tasks of Red and Green type, and n^ and Ug be the 
number of robots in Red and Green task state in a robot's history window. 
The probabilities for changing a robot's state are again given by transition 
functions that now depend on the fractions of observed tasks and robots of 
each type: rhr — mr/{mr + nig), rhg — mg/[mr. + nig), fir — nr/{nr + ng), 
and fig = ng/ijir + ng). In our previous work '15' we showed that in order to 
achieve the desired long term behavior for task allocation {i.e., in the steady 
state the average fraction of Red and Green robots is equal to the fraction of 
Red and Green tasks respectively), the transition rates must have the following 
functional form: 

fg^r{fhr,flr) = fhrg{nir - fir) , (23) 

fr^g{mr,fir) = fhgg(fhg - fig) = (1 - nir)g{~fhr + fir). (24) 

Here g{z) is a continuous, monotonically increasing function of its argument 
defined on an interval [—1, 1]. In this paper we consider the following forms for 



• Power: g(z) = lOOVlOO 

• Stepwise linear: g{z) = zQ{z)} 

To analyze this task allocation model, let us again consider a single robot 
that searches for tasks to perform and makes a transition to Red and Green 
states according to transition functions defined above. Let Pr{t) be the proba- 
bility that the robot is in the Red state at time i, with Equation |2| governing 
its time evolution. Note that Pr{t) is also the average fraction of Red robots, 

Pr{t) = Nr{t)/N. 

As in the previous case, the next step of the analysis is averaging over the 
the robot's histories, i.e., rhr and fir- Note that a robot's observations of avail- 
able tasks can still be modeled by a Poisson distribution similar to Equation ^ 
However, since the number of robots of each task state changes stochastically 
in time, the statistics of n^ and Ug should be modeled as a doubly stochas- 
tic Poisson process (also called Cox process) with stochastic rates. This would 
complicate the calculation of the average over n^ — nr/{nr + rig) and require 
mathematical details that go well beyond the scope of this paper. Fortunately, 
as we demonstrated in the previous section, if a robot's observation window 
contains many readings, then the estimated fraction of task types is exponen- 
tially close to the average of the Poisson distribution. This suggests that for 
sufficiently high densities of tasks and robots we can neglect the stochastic ef- 
fects of modeling observations for the purpose of our analysis, and replace the 
robot's observation by their average (expected) values. In other words, we use 
the following approximation: 

1 /■* 

fir Ki - pr{t')dt' (25) 

" Jt-h 

1 /•* 

rhr « — / iir(t')dt' . (26) 

" Jt-h 

The Equations [3 EHl ^^d I^Hl are a system of integro-differential equations 
that uniquely determine the dynamics oipr{t). In the most general case it is 
not possible to obtain solutions by analytical means, hence one has to solve the 
system numerically. However, if the task density does not change in time, we 
can still perform steady state analysis. Steady state analysis looks for long-term 
solutions that do not change in time, i.e., dp^/dt = 0. Let /ijl be the density 
of Red tasks, and pq = Pr{t — > cxo) be the steady state value, so that thr = ^J!, 
fir — Pr- Then, by setting left hand side of Equation [5] to zero, we get 

(1 - Po)Ai°<?(Ai° - Po) = Po(l - /i.°)g(-Mr + Po) (27) 

Note that po = /i° is a solution to Equation 1271 so that in the steady state 
the fraction of Red robots equals the fraction of red tasks as desired. To show 
that this is the only solution, we note that for a fixed /i° the right- and left-hand 

^The step function © is defined as &{z) = 1 if z > 0; otherwise, it is 0. The step function 
guarantees that no transitions to Red state occur when nir < n^. 



sides of the equation are monotonically increasing and decreasing functions of 
Po respectively, due to the monotonicity of g{z). Consequently, the two curves 
can meet only once and that proves the uniqueness of the solution. 

4.2.1 Phenomenological Model 

Exact stochastic models of task allocation can quickly become analytically in- 
tractable, as we saw above. Instead of exact models, it is often more conve- 
nient to work with the so-called Rate Equations model. These equations can 
be derived from the exact stochastic model by appropriately averaging it jl5) : 
however, they are often (see, for example, population dynamics 6 ) phenomeno- 
logical, or ad hoc, in nature — constructed by taking into account the system's 
salient processes. This approach makes a number of simplifying assumptions: 
namely, that the system is uniform and dilute (not too dense), that actions of 
individual entities are independent of one another, that parameters can be rep- 
resented by their mean values and that system behavior can be described by its 
average value. Despite these simplifications, resulting models have been shown 
to correctly describe dynamics of collective behavior of robotic systems |18) . 
Phenomenological models are useful for answering many important questions 
about the performance of a MRS, such as, does the steady state exist, how long 
does it take to reach it, and so on. Below we present a phenomenological model 
of dynamic task allocation. 

Individual robots are making their decisions to change task state probabilis- 
tically and independently of one another. A robot will change state from Green 
to Red with probability fg^r and with probability 1 — fg~^r it will remain in 
the Green state. We can succinctly write ANg^r and ANr^g, the number of 
robots that switch from Green to Red and vice versa during a sufficiently small 
time interval At, as 



1=1 

N,. 

ANr^g = ^(l-a;,)(/,^<,(5(a;,) + (l-/,^3)5(a;, -1)) 



Here we introduced a state variable Xi , such that Xi — 1 when a robot is in the 
Green state, and Xi = when a robot is in the Red state. S{x) is Kronecker 
delta, defined as 6{x) = 1 when x = and S{x) = otherwise. Therefore, 
ANg^r is a random variable from a binomial distribution specified by a mean 
/x = fg^rNg and variance a^ = fg^ri^ — fg^r)Ng. Similarly, the distribution 
of the random variable AN^^g is specified by mean ^ = fr^gNr and variance 

a2 = fr^gil - fr^g)Nr. 

During a time interval Ai the total number of robots in Red and Green task 
states will change as individual robots make decisions to change states. The 
following finite difference equation specifies how the number of Red will change 



on average: 

Nr{t + At) = Nr{t) + eANg^rAt - sANr^gAt (28) 

Rearranging the equation and taking the continuous time hmit {At -^ 0) yields 
a differential Rate Equation that describes time evolution of the number of Red 
robots. By taking the means of AA^'s as their values, we recover Equation |21 

Keeping AiV's as random variables allows us to study the effect the prob- 
abilistic nature of the robots' decisions have on the collective behavior.^ We 
solve Equation 1281 bv iterating it in time and drawing AA^'s at random from 
their respective distributions. The solutions are subject to the initial condition 
Nr(t <0)=N and specify the dynamics of task allocation in robots. 

Functions fg^r smd fr^g are calculated using estimates of the densities of 
Red tasks (771^) and robots in Red state (n^) from the observed counts stored 
in the robot's history window. 

Transition rates fg^r and fr~,g in the model are mean values, averaged over 
all histories and all robots. In order to compute them, we need to aggregate 
observations of all robots. Suppose each robot has a history window of length 
h. For a particular robot i, the values in the most recent observational slot are 
N^^, N^ , Mf^ and Af? , the observed numbers of Red and Green robots and 
tasks respectively at time t. In the next latest slot, the values are N}^, ^l g^ 
Ml J. and Mfg, the observed numbers at time t — A, and so on. Each robot 
estimates the densities of Red robots and tasks using the following calculation: 

h-i j^j h-i 

j=Q ^^t,r ^ ^^i,g j=0 

When observations of all robots are taken into account, the mean of the 
observed densities of Red robots at time t — -^ Si=i '^?r — '^i^^ fluctuate due 
to observation noise, but on average it will be proportional to Nr{t)/N, which 
is the actual density of Red robots at time t. The proportionality factor is 
related to physical robot parameters, such as speed and observation area (see 
Section |OJ. Likewise, the average of the observed densities at time t — jA is 
Jj J2i=i ''^ir oc iVr(t — jA)/N, the density of robots at time t — jA. Thus, the 
aggregate estimates of the fractions of Red robots and tasks are: 

N /i-l 

^- = NT.'''^'- = NhT.^rit~3A) (31) 

1=1 j=0 

N h-1 

i=l j=0 



^Note that we do not model here the effect of observation noise due to uncertainty in sensor 
readings and fluctuations in the distribution of tasks. 



Robots are making their decisions asynchronously, i.e., at sHghtly different 
times. Therefore, the last terms in the above equations are best expressed in 
continuous form: e.g., I/Nhj^^ Nr{t — T)dT (see Equation 123 and Equation !^ . 

Estimates Equation 1^ and 1^ can be plugged into Equation 1^ and Equa- 
tion 131 to compute the values of transition probabilities for any choice of the 
transition function (power or linear). Once we know fr—>g and fg^r, we can 
solve Equation I2H1 to study the dynamics of task allocation in robots. Note that 
Equation 1281 is now a time-delay finite difference equation, and solutions will 
show typical oscillations. 

We solve the models presented in this section and validate their predictions 
in context of the multi- foraging task described next. 

5 Multi-Robot Multi-Foraging Task 

In this section we describe the multi-foraging task domain in which we exper- 
imentally tested our dynamic task allocation mechanism, including the simu- 
lation environment used and robot sensing and control characteristics. In Sec- 
tion |^2 we use this application to validate the models presented above, solve 
them and compare their solutions to the results of embodied simulations. 

5.1 Task Description 

The traditional foraging task is defined by having an individual robot or group 
of robots collect a set of objects from an environment and either consume on 
the spot or return them to a common location [Hj. Multi-foraging, a variation 
on traditional foraging, is defined in P] and consists of an arena populated by 
multiple types of objects to be concurrently collected. 

In our multi- foraging domain, there are two types of objects (e.g., pucks) 
randomly dispersed throughout the arena: Puck^ed and Puckfj^een pucks that 
are distinguishable by their color. Each robot is equally capable of foraging 
both puck types, but can only be allocated to foraging for one type at any 
given time. Additionally, all robots are engaged in foraging at all times; a robot 
cannot be idle. A robot may switch the puck type for which it is foraging 
according to its control policy, when it determines it is appropriate to do so. 
This is an instantiation of the general task allocation problem described earlier 
in this paper, with puck colors representing different task types. 

In the multi-foraging task, the robots move in an enclosed arena and pick up 
encountered pucks. When a robot picks up a puck, the puck is consumed (i.e., 
it is immediately removed from the environment, not transported to another 
region) and the robot carries on foraging for other pucks. Immediately after 
a puck is consumed, another puck of the same type is placed in the arena at 
a random location. This is done so as to maintain a constant puck density 
in the arena throughout the course of an experiment. In some situations, the 
density of pucks can impact the accuracy or speed of convergence to the desired 
task allocation. This is an important consideration in dynamic task allocation 



mechanisms for many domains; however, in this work we want to Hmit the 
number of experimental variables impacting system performance. Therefore, 
we reserve the investigation on the impact of varying puck densities for future 
work. 

The role of dynamic task allocation in this domain requires the robots to 
split their numbers by having some forage for Puck/jerf pucks and others for 
Puck-Green pucks. For the purpose of our experiments, we desire an allocation 
of robots to converge to a situation in which the proportion of robots foraging 
for Puckfled pucks is equal to the proportion of Puck/je^ pucks present in the 
foraging arena (e.g., if Fuck fi^d pucks make up 30% of the pucks present in the 
foraging arena, then 30% of the robots should be foraging for Puck^ed pucks). 
In general, the desired allocation could take other forms. For example, it could 
be related to the relative reward or cost of foraging each puck type without 
change to our approach. 

We note that the limited sensing capabilities and lack of direct communica- 
tion of the individual robots in the implementation of our task domain prohibits 
them from acquiring global information such as the size and shape of the for- 
aging arena, the initial or current number of pucks to be foraged (total or by 
type) , or the initial or current number of foraging robots (total or by foraging 
type). 

5.2 Simulation Environment 

In order to experimentally demonstrate the dynamic task allocation mecha- 
nism we made use of a physically-realistic simulation environment. Our simu- 
lation trials were performed using Player and Gazebo simulation environments. 
Player Pj is a server that connects robots, sensors, and control programs over 
a network. Gazebo ^TT simulates a set of Player devices in a 3-D physically- 
realistic world with full dynamics. Together, the two represent a high-fidelity 
simulation tool for individual robots and teams that has been validated on a 
collection of real-robot robot experiments using Player control programs trans- 
ferred directly to physical mobile robots. Figure ^ provides snapshots of the 
simulation environment used. All experiments involved 20 robots foraging in a 
400m^ arena. 

The robots used in the experimental simulations are realistic models of the 
ActivMedia Pioneer 2DX mobile robot. Each robot, approximately 30 cm in 
diameter, is equipped with a differential drive, an odometry system using wheel 
rotation encoders, and 180 degree forward-facing laser rangefinder used for ob- 
stacle avoidance and as a fiducial detector/reader. Each puck is marked with 
a fiducial that marks the puck type and each robot is equipped with a fiducial 
that marks the active foraging state of the robot. Note that the fiducials do not 
contain unique identities of the pucks or robots but only mark the type of the 
puck or the puck type a given robot is engaged in foraging. Each robot is also 
equipped with a 2-DOF gripper on the front, capable of picking up a single 8 
cm diameter puck at a time. There is no capability available for explicit, di- 
rect communication between robots nor can pucks and other robots be uniquely 




Figure 1: Snapshots from the shiiulation environment used, (left) An overhead 
view of foraging arena and robots, (right) A closeup of robots and pucks. 



identified. 



5.3 Behavior-Based Robot Controller 

All robots have identical behavior-based controllers consisting of the following 
mutually exclusive behaviors: Avoiding, Wandering, Puck Servoing, Grasping, 
and Observing. Descriptions of robot behaviors are provided below. 

- The Avoiding behavior causes the robot to turn to avoid obstacles in its 
path. 

- The Wandering behavior causes the robot to move forward and, after a 
random length of elapsed time, to turn left or right through a random arc 
for a random period of time. 

- The Puck Servoing behavior causes the robot to move toward a detected 
puck of the desired type. If the robot's current foraging state is Roboti^^e^, 
the desired puck type is Puckflerf, and if the robots current foraging state 
is Robotcreen, the dcsircd puck type is Puckcreen- 

- The Grasping behavior causes the robot to use its gripper to pick up and 
consume a puck within the gripper's grasp. 

- The Observing behavior causes the robot to take the current fiducial 
information returned by the laser rangefinder and record the detected 
pucks and robots to their respective histories. The robot then updates 
its foraging state based on those histories. A description of the histories 
is given in Section 15.3.11 and a description of the foraging state update 
procedure is given in Section 15.3.21 

Each behavior listed above has a set of activation conditions based on rel- 
evant sensor inputs and state values. When met, the conditions cause the be- 
havior to be become active. A description of when each activation condition is 
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Table 1: Behavior Activation Conditions. Behaviors are listed in order of de- 
creasing rank. Higher ranking behaviors preempt lower ranking behaviors in 
the event multiple are active. X denotes the activation condition is irrelevant 
for the behavior. 



active is given below. The activation conditions of all behaviors are shown in 
Table U] 

- The Obstacle Detected activation condition is true when an obstacle 
is detected by the laser rangefinder within a distance of 1 meter. Other 
robots, pucks, and the arena walls are considered obstacles. 

- The Puckoet Detected activation condition is true if the robot's current 
foraging state is Robotuet and a puck of type Puck^iet (where Det is Red 
or Green) is detected within a distance of 5 meters and within ± 30 degrees 
of the robot's direction of travel. 

- The Gripper Break-Beam On activation condition is true if the break- 
beam sensor between the gripper jaws detects an object. 

- The Observation Signal activation condition is true if the distance trav- 
eled by the robot according to odomctry since the last time the Observing 
behavior was activated is greater than 2 meters. 



5.3.1 Robot State Information 

All robots maintain three types of state information: foraging state, observed 
puck history, and observed robot history. The foraging state identifies the type 
of puck the robot is currently involved in foraging. A robot with a foraging 
state of Robot/jed refers to a robot engaged in foraging Fuckfied pucks and a 
foraging state of Kohotcreen refers to a robot engaged in foraging Puckcreen 
pucks. For simplicity, we will refer to both robot foraging states and puck types 
as Red and Green. The exact meaning will be clear in context. 

Each robot is outfitted with a colored beacon passively observable by nearby 
robots which indicates the robot's current foraging state. The color of the bea- 
con changes to reflect the current state - a red beacon for a foraging state of 
Red and a green beacon for foraging state Green. Thus, the colored beacon 
acts as a form of local, passive communication conveying the robot's current 



foraging state. All robots maintain a limited, constant-sized history storing the 
most recently observed puck types and another constant-sized history storing 
the foraging state of the most recently observed robots. Neither of these his- 
tories contains a unique identity or location of detected pucks or robots, nor 
does it store a time stamp of when any given observation was made. The his- 
tory of observed pucks is limited to the last MAX-PUCK-HISTORY pucks observed 
and the history of the foraging states of observed robots is limited to the last 
MAX-ROBDT-HISTORY robots observed. 

While moving about the arena, each robot keeps track of the approximate 
distance it has traveled by using odometry measurements. At every interval of 
2 meters traveled, the robot makes an observation performed by the Observing 
behavior. This procedure is nearly instantaneous; therefore, the robot's behavior 
is not outwardly affected. The area in which pucks and other robots are visible is 
within 5 meters and ± 30 degrees in the robot's direction of travel. Observations 
are only made after traveling 2 meters because updating too frequently leads 
to over-convergence of the estimated puck and robot type proportions due to 
repeated observations of the same pucks and/or robots. On average, during our 
experiments, a robot detected 2 pucks and robots per observation. 

5.3.2 Foraging State Transition Function 

After a robot makes an observation, it re-evaluates and probabilistically changes 
its current foraging state given the newly updated puck and robot histories. The 
probability by which the robot changes its foraging state is defined by the tran- 
sition function. We experimentally studied transition functions given by Equa- 
tion|21 Equation 1231 and Eauation l24l with both power and linear forms. Below 
we present results of analysis and simulations and discuss the consequences the 
choice of the transition function has on system level behavior. 

6 Analysis and Simulations Results 

The mathematical models developed in Section 01 can be directly applied to the 
multi-foraging task if we map Red and Green tasks to Red and Green pucks 
and task states of robots to their foraging states. Model parameters, such as 
e, a, etc, depend on physical realizations of the implementation and can be 
computed from details of the multi-foraging task as described below. 

6.1 Observations of Pucks Only 

First, we study the model of dynamic task allocation, presented in Section Wl\ 
where robots observe only pucks and make decision to switch foraging state ac- 
cording to the transition functions given by Equation [3| We compared theoret- 
ical predictions of the robots' collective behavior with results from simulations. 
We used Eauationll4landll6lto compute how the average number of robots in 
the Red state changes in time when the puck distribution is suddenly changed. 



The parameter values were obtained from experiments, po = 1-0 was the initial 
density of Red robots (of 20 total robots), ^o = 0.3 was the initial Red puck 
density (of 50 total pucks), which remained constant until it was changed by 
the experimenter. The first change in puck density was A/i = 0.5, meaning that 
80% of the pucks in the arena are now Red. The second change in puck density 
was A/z = —0.3, to 50% Red pucks. 

e is the rate at which robots make decisions to switch states. Robot traveled 
2 in between observations at an average speed of 0.2 m/s; therefore, there are 
10 s between observations, and e = 0.1. h, the history length, is the number 
of pucks in the robot's memory. aM° is the rate at which robots encounter 
pucks. A robot makes an observation of its local environment at discrete time 
intervals. The area visible to the robot is Ayis = (5 m)^TT/6 = 13.09, with 1/6 
coming from the 60° angle of view. The arena area is A = 315 m'^; therefore, 
aM° = AyisM^ /A — 2.1. We studied the dynamics of the system for different 
history lengths h. 
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Figure 2: Evolution of the fraction of Red robots for different history lengths. 
Robots' decision to change state is based on observations of pucks only. 



Figure |21 shows evolution of the numbers of Red robots for different history 
lengths. Initially, the distribution of Red pucks is set to 30% and all the robots 
are in the Red foraging state. At t = 500 s, the puck distribution changes 



abruptly to 80%, and at t = 1000 s to 50%. The solid line shows results of 
simulations — the fraction of Red robots, averaged over 10 runs. The dashed 
line gives theoretical predictions for the parameters quoted above. Since we 
are in the eh ^ 1 limit (for h — 50, 100), the time it takes to converge to the 
steady state is linear in history length, tconv ~ h, as predicted by Equation 1161 
The agreement between theoretical and experimental results is excellent. We 
stress that there are no free parameters in the theoretical predictions — only 
experimental values of the parameters were used in producing these plots. 

In addition to being able to predict the average collective behavior of the 
multi-robot system, we can also quantitatively characterize the amount of fluc- 
tuations in the system. Fluctuations are deviations from the steady state (after 
the system has converged to the steady state) that arise from the stochastic 
nature of robot's observations and decisions. These deviations result in fluctu- 
ations from the desired global distribution of Red and Green robots seen in an 
individual experiment. One can suppress these fluctuations by averaging results 
of many identical experiments. 




Figure 3: Histogram of the fraction of Red robots in the steady state for three 
different puck distributions (data for h — 10). /ig specifies fraction of Red pucks. 
Lines are theoretical predictions of the distribution of Red robots. 



To measure the strength of the fluctuations, we take data from an individual 
experimental run and extract the fraction of Red robots, after the system has 
converged to the steady state, for each of the three Red puck distributions: 
Ho — 30%, 50%, 80%. Because the runs were relatively short, we only have 
300 s worth of data (30 data points) in the converged state; however, since each 



experiment was repeated ten times, we make the data sets longer by appending 
data from all experiments. In the end, we have 300 measurements of the steady 
state Red robot density for three different puck distributions. Figure I^TI shows 
the histogram of robot distributions for three different puck distributions. The 
solid lines are computed using Equation|221 where for 7 we used the actual means 
of the steady state distributions (7 = 0.28, 0.47 and 0.7 for jjlq = 30%, 50% and 
80% respectively) . We can see from the plots that the theory correctly predicts 
the strength of fluctuations about the steady state. As is true of binomial 
distributions, the fluctuations (measured by the variance) are greatest for cases 
where the numbers of Red and Green pucks are comparable (/io = 50%) and 
smaller when their numbers are very different (^0 = 80%). 

6.2 Observations of Pucks and Robots 

In this section we study the dynamic task allocation model developed in Sec- 
tion ^^ in which robots use observations of pucks and other robots' foraging 
states to make decision to change their own foraging state. 

Figure 01 shows results of embodied simulations (solid lines) as well as so- 
lutions to the model Equation |2S1 (dashed lines) for different values of robot 
history length and forms of transition function (given by Equation 1231 and I24L 
with g{z) linear or power function). Initially, the Red puck fraction (dotted line) 
is 30%. It is changed abruptly at i = 500 s to 80% and then again sit — 2000 s 
to 50%. Each solid line showing Red robot density has been averaged over 10 
runs. We rescale the dimensionless time of the model by parameter 10, corre- 
sponding e = 0.1. The history length was the only adjustable parameter used in 
solving the equations. The values of h used to compute the observed fraction of 
Red robots Ur in Eauation l31l were h — 2, 8, 16, corresponding to experimental 
history lengths 10, 50, 100 respectively. For m^, the observed fraction of Red 
pucks, we used their actual densities. 

In order to explain the difference in history lengths between theory and ex- 
periment, we note that in the simulation experiments, the history length means 
the numbers of observed robots and pucks, while in the model, it means the 
number of observations, with multiple objects sighted within a single observa- 
tion. According to calculations in Section IfTTl a robot observes about 2 pucks 
in a single observation. Moreover, the robot travels 2 m between observations, 
yet it sees 5 m out during each observation, meaning that individual observa- 
tions will be correlated. Observations will be further correlated because of the 
pattern of a robot's motion — as the robot moves in a straight line towards 
a goal, it is likely to observe overlapping regions of the arena. These consid- 
erations could explain the factor of five difference between the history lengths 
used in the experiments and the corresponding values used in the model. More 
detailed experiments, for example, ones in which robots travel farther between 
observations, are necessary to explain these differences. 

Solutions exhibit oscillations, although eventually oscillations decay and so- 
lutions relax to their steady state values. In all cases, the steady state value is 
the same as the fraction of red pucks in the arena. History-induced oscillations 
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Figure 4: Evolution of the fraction of Red robots for different history lengths 
and transition functions, compared to predictions of the model 



are far more pronounced for the linear transitfon function (Figure EJa)) than for 
the power transition function (Figure 0Jb)). For the power transition function, 
these oscillations are present but become evident only for longer history lengths. 
This behavior is probably caused by the differences between the values of tran- 
sition functions near the steady state: while the value of the power transition 
function remains small near the steady state, the value of the linear transition 
function grows linearly with the distance from the steady state, thereby ampli- 
fying any deviations from the steady state solution. The amplitude and period 
of oscillations and the convergence rate of solutions to the steady state all de- 
pend on history length, and it generally takes longer to reach the steady state 
for longer histories. Another conclusion is that the linear transition function 
converges to the desired distribution faster than the power function, at least for 
moderate history lengths. 

7 Discussion 

We have constructed and analyzed mathematical models of dynamic task alloca- 
tion in a multi-robot system. The models are general and can be easily extended 
to other systems in which robots use a history of local observations of the envi- 
ronment as a basis for making decisions about future actions. These models are 
based on theory of stochastic processes. In order to study a robot's behavior, 
we do not need to know its exact trajectory or the trajectories of other robots; 
instead, we derive a probabilistic model that governs how a robot's behavior 
changes in time. In some simple cases these models can be solved analytically. 
However, stochastic models are usually too complex for exact analytic treat- 
ment. Thus, in the scenario described in Section im in which only observations 
of tasks are made, though the individual model is tractable, the stochastic model 
of the collective behavior is not. Instead, we use averaging and approximation 
techniques to quantitatively study the dynamics of the collective behavior. Such 
models, therefore, do not describe the robots' behavior in a single experiment, 
but rather the behavior that has been averaged over many experimental or sim- 
ulations runs. Fortunately, results of experiments and simulations are usually 
presented as an average over many runs; therefore, mathematical models of aver- 
age collective behavior can be used to describe experimental results. In fact, the 
stochastic model produces excellent agreement with experimental results under 
all experimental conditions and without using any adjustable parameters. 

Phenomenological models are more straightforward to construct and ana- 
lyze than exact stochastic models — in fact, they can be easily constructed 
from details of the individual robot controller JS| . The ease of use comes at a 
price, namely, the number of simplifying assumptions that were made in order 
to produce a mathematically tractable model. First, we assume that the robots 
are functioning in a dilute limit, where they are sufficiently separated that their 
actions are largely independent of one another. Second, we assume that the tran- 
sition rates can be represented by aggregate quantities that are spatially uniform 
and independent of the details of the individual robot's actions or history. We 



also assume the system is homogeneous, with modeled robots characterized by a 
set of parameters, each of them representing the mean value of some real robot 
feature: mean speed, mean duration for performing a certain maneuver, and so 
on. Real robot systems are heterogeneous: even if the robots are executing the 
same controller, there will always be variations due to inherent differences in 
hardware. We do not consider parameter distributions in our models as would 
be necessary to describe such heterogeneous systems. Finally, phenomenological 
models more reliably describe systems where fluctuations (deviations from the 
mean behavior) can be neglected, as happens in large systems or when many 
experimental runs are aggregated. However, even if phenomenological models 
don't agree with experiments exactly, as we saw in Section 16.21 they can still 
reliably predict most behaviors of interest even in not-so-large systems. They 
are, therefore, a useful tool for modeling and analyzing multi-robot systems. 

8 Conclusion 

Mathematical analysis can be a useful tool for the study and design of MRS 
and a viable alternative to experiments and simulations. It can be applied to 
large systems that are too costly to build or take too long to run in simula- 
tion. Mathematical analysis can be used to study the behavior of an MRS, 
select parameters that optimize its performance, prevent instabilities, etc. In 
conjunction with the design process, mathematical analysis can help under- 
stand the effect individual robot characteristics have on the collective behavior 
before a system is implemented in hardware or in simulation. Unlike experi- 
ments and simulations, where exhaustive search of the design parameter space 
is often required to reach any conclusion, analysis can often produce exact an- 
alytic results, or scaling relationships, for the quantities of interest. If these 
are not possible, exhaustive search of the parameter space is much more prac- 
tical and efficient. Finally, results of analysis can be used as feedback to guide 
performance-enhancing modifications of the robot controller. 

In this paper wc have described an dynamic task allocation mechanism 
where robots use local observations of the environment to decide their task 
assignments. We have presented a mathematical model of this task allocation 
mechanism and studied it in the context of a multi-foraging task scenario. We 
compared predictions of the model with results of embodied simulations and 
found excellent quantitative agreement. In this application, mathematical anal- 
ysis could help the designer choose robot properties, such as the form of the 
transition probability used by robots to switch their task state, or decide how 
many observations the robot ought to consider. 

Mathematical analysis of MRS is a new field, but its success in explaining 
experimental results shows it to be a promising tool for the design and analysis 
of robotic systems. The field is open to new research directions, from apply- 
ing analysis to new robotic systems to developing increasingly sophisticated 
mathematical models that, for example, account for heterogeneities in robot 
population that are due to differences in their sensors and actuators. 
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